Phylogenomics of eukaryotes: impact of missing data on large alignments.
Abstract
Resolving the relationships between Metazoa and other eukaryotic groups as well as between metazoan phyla is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (approximately 30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.
Similar resources
Phylogenomics of Eukaryotes: the impact of missing data on large alignments
Phylogénie, Bioinformatique et Génome, UMR 7622 CNRS, Université Pierre et Marie Curie, 9 quai St Bernard Bât. C– 75005 Paris – France 1 School of Animal and Microbial Sciences, The University of Reading, Whiteknights PO Box 228, Reading RG6 6AJ, United Kingdom. 2 Current address: Department of Biochemistry Dalhousie University, Halifax, Nova Scotia, Canada 3 Department of Zoology, University o...
Accounting For Alignment Uncertainty in Phylogenomics
Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment posi...
The Impact of Missing Data on Species Tree Estimation.
Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree es...
Terrace Aware Phylogenomic Inference from Supermatrices
One approach in phylogenomics to infer the tree of life is based on concatenated multiple sequence alignments from many genes. Unfortunately, the resulting so-called supermatrix is usually sparse, that is, not every gene sequence is available for all species in the supermatrix. Due to the missing sequence information a phylogenetic inference, assuming that each gene evolves with its own substit...
Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes.
Advances in molecular phylogeny of eukaryotes have suggested a tree composed of a small number of supergroups. Phylogenomics recently established the relationships between some of these large assemblages, yet the deepest nodes are still unresolved. Here, we investigate early evolution among the major eukaryotic supergroups using the broadest multigene dataset to date (65 species, 135 genes). Ou...
Journal: Molecular Biology and Evolution
Volume 21, Issue 9
Publication date: 2004